Driver behavior risk assessment and pedestrian awareness

ABSTRACT

Driver behavior risk assessment and pedestrian awareness may include receiving an input stream of images of an environment including one or more objects within the environment, estimating an intention of an ego vehicle based on the input stream of images and a temporal recurrent network (TRN), generating a scene representation based on the input stream of images and a graph neural network (GNN), generating a prediction of a situation based on the scene representation and the intention of the ego vehicle, and generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Non-Provisional patent application Ser. No. 17/162,054 (Attorney Docket No. HRA-49270) entitled “DRIVER BEHAVIOR RISK ASSESSMENT AND PEDESTRIAN AWARENESS”, filed on Jan. 29, 2021, which claims the benefit of U.S. Provisional Patent Application, Ser. No. 63/113,150, filed on Nov. 12, 2020; the entirety of the above-noted application is incorporated by reference herein.

BACKGROUND

Modeling driver behavior is still an open research problem. Driver behavior may be complicated as it involves low-level operational controlling (e.g., vehicle velocity/acceleration, throttle/brake position, and lateral acceleration) and high level cognitive processing (e.g., the prediction of driving maneuvers, driver intent and state, traffic participants' intention, and environmental factors). At the cognitive level, the driver may first identify relevant elements that impact their navigation in the scene. Second, the driver may reason about the interconnections between these elements, and third, the driver may infer the future actions of the traffic participants. Modeling such a thought process has proven challenging because a driver's perception of risk is a complex cognitive process that is largely manifested by the voluntary response of the driver to external stimuli as well as the apparent attentiveness of participants toward the ego-vehicle.

BRIEF DESCRIPTION

According to one aspect, a system for driver behavior risk assessment and pedestrian awareness may include an image sensor receiving an input stream of images of an environment including one or more objects within the environment, an intention estimator estimating an intention of an ego vehicle based on the input stream of images and a temporal recurrent network (TRN), a scene representation generator generating a scene representation based on the input stream of images and a graph neural network (GNN), a situation predictor generating a prediction of a situation based on the scene representation and the intention of the ego vehicle, and a driver response determiner generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.

The intention of the ego vehicle may be estimated as a left-turn intention, a right-turn intention, or a straight-travel intention. The environment may include a straight topology, a three-way intersection topology, or a four-way intersection topology. The situation may include a stop sign, a traffic light, a crossing pedestrian, a crossing vehicle, a vehicle blocking ego lane, a congestion, a jaywalking, a vehicle backing into parking space, a vehicle on shoulder open door, or a cut-in. The system for driver behavior risk assessment and pedestrian awareness may include a risk object identifier (ROI) extracting image-level and object-level features from the input stream of images of the environment. The ROI may determine one or more object bounding boxes for one or more of the objects within the environment. One or more of the object bounding boxes may be around a face or a head of a pedestrian and the ROI determines whether the pedestrian is looking or is not looking at the ego vehicle. The situation predictor may generate the prediction of the situation based on an element-wise dot product of the scene representation and the intention of the ego vehicle. The driver response determiner may generate the influenced or non-influenced action determination based on passing the prediction of the situation through a multilayer perceptron (MLP) and the scene representation. The situation predictor may classify the prediction of the situation into a binary class.

According to one aspect, a method for driver behavior risk assessment and pedestrian awareness may include receiving an input stream of images of an environment including one or more objects within the environment, estimating an intention of an ego vehicle based on the input stream of images and a temporal recurrent network (TRN), generating a scene representation based on the input stream of images and a graph neural network (GNN), generating a prediction of a situation based on the scene representation and the intention of the ego vehicle, and generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.

The intention of the ego vehicle may be estimated as a left-turn intention, a right-turn intention, or a straight-travel intention. The environment may include a straight topology, a three-way intersection topology, or a four-way intersection topology. The situation may include a stop sign, a traffic light, a crossing pedestrian, a crossing vehicle, a vehicle blocking ego lane, a congestion, a jaywalking, a vehicle backing into parking space, a vehicle on shoulder open door, or a cut-in. The method for driver behavior risk assessment and pedestrian awareness may include extracting image-level and object-level features from the input stream of images of the environment.

The method for driver behavior risk assessment and pedestrian awareness may include determining one or more object bounding boxes for one or more of the objects within the environment. The method for driver behavior risk assessment and pedestrian awareness may include determining whether a pedestrian is looking or is not looking at the ego vehicle, and one or more of the object bounding boxes may be around a face or a head of the pedestrian. The method for driver behavior risk assessment and pedestrian awareness may include generating the prediction of the situation based on an element-wise dot product of the scene representation and the intention of the ego vehicle. The method for driver behavior risk assessment and pedestrian awareness may include generating the influenced or non-influenced action determination based on passing the prediction of the situation through a multilayer perceptron (MLP) and the scene representation.

A driver behavior risk assessment and pedestrian awareness vehicle may include an image sensor receiving an input stream of images of an environment including one or more objects within the environment, an intention estimator estimating an intention of the vehicle based on the input stream of images and a temporal recurrent network (TRN), a scene representation generator generating a scene representation based on the input stream of images and a graph neural network (GNN), a situation predictor generating a prediction of a situation based on the scene representation and the intention of the vehicle, and a driver response determiner generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an exemplary component diagram of a system for driver behavior risk assessment and pedestrian awareness, according to one aspect.

FIG. 2 is an exemplary component diagram of a system for driver behavior risk assessment and pedestrian awareness, according to one aspect.

FIG. 3 is an exemplary diagram of factors associated with driver behavior risk assessment and pedestrian awareness, according to one aspect.

FIG. 4 is an exemplary diagram of external stimuli associated with driver behavior risk assessment and pedestrian awareness, according to one aspect.

FIG. 5 is an exemplary flow diagram of a method for driver behavior risk assessment and pedestrian awareness, according to one aspect.

FIG. 6 is an illustration of an example computer-readable medium or computer-readable device including processor-executable instructions configured to embody one or more of the provisions set forth herein, according to one aspect.

FIG. 7 is an illustration of an example computing environment where one or more of the provisions set forth herein are implemented, according to one aspect.

DETAILED DESCRIPTION

The following includes definitions of selected terms employed herein. The definitions include various examples and/or forms of components that fall within the scope of a term and that may be used for implementation. The examples are not intended to be limiting. Further, one having ordinary skill in the art will appreciate that the components discussed herein may be combined, omitted or organized with other components or organized into different architectures.

A “processor”, as used herein, processes signals and performs general computing and arithmetic functions. Signals processed by the processor may include digital signals, data signals, computer instructions, processor instructions, messages, a bit, a bit stream, or other means that may be received, transmitted, and/or detected. Generally, the processor may be a variety of various processors including multiple single and multicore processors and co-processors and other multiple single and multicore processor and co-processor architectures. The processor may include various modules to execute various functions.

A “memory”, as used herein, may include volatile memory and/or non-volatile memory. Non-volatile memory may include, for example, ROM (read only memory), PROM (programmable read only memory), EPROM (erasable PROM), and EEPROM (electrically erasable PROM). Volatile memory may include, for example, RAM (random access memory), synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), and direct RAM bus RAM (DRRAM). The memory may store an operating system that controls or allocates resources of a computing device.

A “disk” or “drive”, as used herein, may be a magnetic disk drive, a solid state disk drive, a floppy disk drive, a tape drive, a Zip drive, a flash memory card, and/or a memory stick. Furthermore, the disk may be a CD-ROM (compact disk ROM), a CD recordable drive (CD-R drive), a CD rewritable drive (CD-RW drive), and/or a digital video ROM drive (DVD-ROM). The disk may store an operating system that controls or allocates resources of a computing device.

A “bus”, as used herein, refers to an interconnected architecture that is operably connected to other computer components inside a computer or between computers. The bus may transfer data between the computer components. The bus may be a memory bus, a memory controller, a peripheral bus, an external bus, a crossbar switch, and/or a local bus, among others. The bus may also be a vehicle bus that interconnects components inside a vehicle using protocols such as Media Oriented Systems Transport (MOST), Controller Area Network (CAN), Local Interconnect Network (LIN), among others.

A “database”, as used herein, may refer to a table, a set of tables, and a set of data stores (e.g., disks) and/or methods for accessing and/or manipulating those data stores.

An “operable connection”, or a connection by which entities are “operably connected”, is one in which signals, physical communications, and/or logical communications may be sent and/or received. An operable connection may include a wireless interface, a physical interface, a data interface, and/or an electrical interface.

A “computer communication”, as used herein, refers to a communication between two or more computing devices (e.g., computer, personal digital assistant, cellular telephone, network device) and may be, for example, a network transfer, a file transfer, an applet transfer, an email, a hypertext transfer protocol (HTTP) transfer, and so on. A computer communication may occur across, for example, a wireless system (e.g., IEEE 802.11), an Ethernet system (e.g., IEEE 802.3), a token ring system (e.g., IEEE 802.5), a local area network (LAN), a wide area network (WAN), a point-to-point system, a circuit switching system, a packet switching system, among others.

A “mobile device”, as used herein, may be a computing device typically having a display screen with a user input (e.g., touch, keyboard) and a processor for computing. Mobile devices include handheld devices, portable electronic devices, smart phones, laptops, tablets, and e-readers.

A “vehicle”, as used herein, refers to any moving vehicle that is capable of carrying one or more human occupants and is powered by any form of energy. The term “vehicle” includes cars, trucks, vans, minivans, SUVs, motorcycles, scooters, boats, personal watercraft, and aircraft. In some scenarios, a motor vehicle includes one or more engines. Further, the term “vehicle” may refer to an electric vehicle (EV) that is powered entirely or partially by one or more electric motors powered by an electric battery. The EV may include battery electric vehicles (BEV) and plug-in hybrid electric vehicles (PHEV). Additionally, the term “vehicle” may refer to an autonomous vehicle and/or self-driving vehicle powered by any form of energy. The autonomous vehicle may or may not carry one or more human occupants.

A “vehicle system”, as used herein, may be any automatic or manual systems that may be used to enhance the vehicle, and driving. Exemplary vehicle systems include an autonomous driving system, an electronic stability control system, an anti-lock brake system, a brake assist system, an automatic brake prefill system, a low speed follow system, a cruise control system, a collision warning system, a collision mitigation braking system, an auto cruise control system, a lane departure warning system, a blind spot indicator system, a lane keep assist system, a navigation system, a transmission system, brake pedal systems, an electronic power steering system, visual devices (e.g., camera systems, proximity sensor systems), a climate control system, an electronic pretensioning system, a monitoring system, a passenger detection system, a vehicle suspension system, a vehicle seat configuration system, a vehicle cabin lighting system, an audio system, a sensory system, among others.

The aspects discussed herein may be described and implemented in the context of non-transitory computer-readable storage medium storing computer-executable instructions. Non-transitory computer-readable storage media include computer storage media and communication media. Examples include flash memory drives, digital versatile discs (DVDs), compact discs (CDs), floppy disks, and tape cassettes. Non-transitory computer-readable storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, modules, or other data.

The present disclosure relates to risk assessment and pedestrian awareness towards driver behavior understanding. Risk may be formulated from a driver-centric perspective to identify road agents that influence the driver in risky situations. A data set having annotations of driver intention (e.g., go straight), scenarios (e.g., a jay-walker is crossing the street), decision of driver maneuver (e.g., slow down), road topology of the scene (e.g., a 4-way intersection) and pedestrian awareness (e.g., looking or not looking) using face annotations when pedestrians are present is provided. These additional road topologies may be coupled with ego car interactions. Annotations may also include yields or original maneuvers. For risk assessment, a risk object identification framework is provided that explicitly models the causal relationship of driver intention, scenario, and decision of driver maneuver.

The scenario, ego-intention, and stimulus may be considered when relationships are formed. In one example, when an ego vehicle approaches an intersection, an ego intention may be fixed, and the obstacles in the path and the influences from traffic agents may be defined. From this, risk objects may be understood and a “Stop” or “Go” may be determined.

Risk objects may be considered as a cause and effect problem. Given a sequence of video frames observed in the past, the model may first parse each frame into objects of interest, each of which is encoded to a feature vector. An egocentric spatiotemporal graph may be constructed using these features as node representations to produce a feature vector that encodes both scene context and the temporal history in the observed frame. A framework that explicitly models the causal relationship between the driver intention, scenario, and decision of driver maneuver is provided herein. The system for driver behavior risk assessment and pedestrian awareness may examine the problem of risk perception and introduce a new dataset to facilitate research in this domain. The dataset may include short video clips or image sequences that include annotations of driver intent, road network topology, situation (e.g., crossing pedestrian), driver response, and pedestrian attentiveness using face annotations and facial bounding boxes. With the dataset, the system may introduce a novel risk object identification (ROI) framework that models the causal relationship of driver intention, situation, and driver's response, thereby enabling causal influence (e.g., cause of driver response or reaction (i.e., one of the traffic participants, one of the environment features, etc.)) to be determined.

For pedestrian attentiveness, the system may provide insight from both classification and detection perspectives. According to one aspect, driver behavior risk assessment and pedestrian awareness may utilize a large scale dataset with pedestrian face annotations in the driving scene to gauge pedestrian attentiveness within the context of risk perception and to provide a detection framework using faces of the pedestrians.

In this regard, advantages or benefits of the driver behavior risk assessment and pedestrian awareness described herein may include addressing the limitations of existing datasets by introducing a novel and comprehensive dataset with a diverse set of situations and annotations to enable research for risk object identification. Additionally, a framework for risk object identification may be provided which models the relationship between driver intention (e.g., where does the driver wish to go?), situation (e.g., reasoning, surroundings, position of traffic participants, directions traffic participants are moving, interaction between ego-vehicle and traffic participants, influence based on other traffic participants, etc.), and the driver response (e.g., continue, stop, slow, turn, etc.). Further, the system may provide annotations for pedestrian attentiveness on a subset of the proposed dataset to enable risk object identification, and provide a framework for pedestrian attention detection using faces.

According to one aspect, the dataset may include an image sequence or stream of images captured from video cameras, which may include LiDAR sensors and/or GPS sensors. Additionally, vehicle Controller Area Network (CAN) data may be collected for analyzing how drivers manipulate steering, braking, and throttle in conjunction with the image sequence or stream of images. This sensor data may be synchronized and timestamped. Further, the data may capture a diverse set of traffic scenes including different traffic environments such as urban, suburban, and highway environments, for example. Pedestrian awareness data may be focused on intersection scenarios where diverse interactions between drivers and pedestrians are present.

The system for driver behavior risk assessment and pedestrian awareness may be utilized to develop robust intelligent driving systems which may be implemented using one or more vehicle systems, as described above (e.g., an autonomous driving system, an electronic stability control system, an anti-lock brake system, a brake assist system, an automatic brake prefill system, a low speed follow system, a cruise control system, a collision warning system, a collision mitigation braking system, an auto cruise control system, a lane departure warning system, a blind spot indicator system, a lane keep assist system, a navigation system, a transmission system, brake pedal systems, an electronic power steering system, visual devices, camera systems, proximity sensor systems, a climate control system, an electronic pretensioning system, a monitoring system, a passenger detection system, a vehicle suspension system, a vehicle seat configuration system, a vehicle cabin lighting system, an audio system, a sensory system, among others), such as by, for example, implementing one or more of the aforementioned vehicle systems based on the driver behavior risk assessment and pedestrian awareness or risk score, etc.

The dataset may include data which is categorized or manually categorized, and include a driver intention, a road topology, a situation, a decision of a driver, and a pedestrian awareness for each clip of data. According to one aspect, automatic situation localization in untrimmed videos may be explored using the proposed dataset.

FIG. 1 is an exemplary component diagram of a system 100 for driver behavior risk assessment and pedestrian awareness, according to one aspect. The system 100 may implement a four-layer representation, (i.e., driver intention, topology, situation, driver's response), to describe driver behavior for risk assessment. Specifically, the labeling structure may be designed for risk assessment. According to one aspect, intention of the ego vehicle or the driver intention may be estimated as a left-turn intention, a right-turn intention, or a straight-travel intention. Therefore, the system 100 for driver behavior risk assessment and pedestrian awareness as described herein does not require forecasting of any trajectories of any vehicles, pedestrians, traffic participants, etc. because the risk assessment is calculated instead. In this way, no trajectory forecasting is required whatsoever and modeling of the relationship between intentions, situation, and driver response may be provided. This explicit modeling facilitates identification of who may be an influence on the driver's behavior and who may be determined as risky or riskier as a traffic participant.

Drivers may be perceptive of the road topology and situation of the scene as part of their planning and decision making. In this regard, the underlying road topology network may be annotated in a topology layer including a straight topology, a three-way intersection topology, or a four-way intersection topology.

While navigating toward a goal (e.g., reach an intersection) via a road topology network, a driver or an ego-vehicle may encounter different driving situations or react to certain road users or traffic participants (e.g., a bicyclist is crossing the street, a truck parked near the ego-lane, etc.). In this regard, a road user that directly impacts driver behavior may be annotated in a situation layer within the dataset. Examples of different types of situations may include a stop sign, a traffic light, a crossing pedestrian, a crossing vehicle, a vehicle blocking ego lane, a congestion, a jaywalking, a vehicle backing into parking space, a vehicle on shoulder open door, or a cut-in, etc.

The response of the driver to the road users may be labeled in the driver's response layer of the dataset. According to one aspect, two types of decisions are annotated (i.e., influenced and non-influenced). Examples of influenced decisions may include deviating from a parked vehicle, yielding to a crossing pedestrian, or stopping for a stop sign, etc.
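As a rough illustration of the four-layer labeling described above, one clip's annotation could be held in a small record such as the following. This is a minimal sketch only: the class names, field names, the `clip_id` field, and the string values are hypothetical choices made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class Intention(Enum):
    LEFT_TURN = "left-turn"
    RIGHT_TURN = "right-turn"
    STRAIGHT = "straight-travel"


class Topology(Enum):
    STRAIGHT = "straight"
    THREE_WAY = "three-way intersection"
    FOUR_WAY = "four-way intersection"


class Response(Enum):
    INFLUENCED = "influenced"
    NON_INFLUENCED = "non-influenced"


@dataclass(frozen=True)
class ClipAnnotation:
    """One record per clip, with one field per annotation layer."""
    clip_id: str          # hypothetical identifier, not from the source
    intention: Intention  # driver intention layer
    topology: Topology    # road topology layer
    situation: str        # situation layer, e.g. "crossing pedestrian"
    response: Response    # driver's response layer


# Example: the driver turns right at a four-way intersection and yields.
example = ClipAnnotation(
    clip_id="clip_0001",
    intention=Intention.RIGHT_TURN,
    topology=Topology.FOUR_WAY,
    situation="crossing pedestrian",
    response=Response.INFLUENCED,
)
```

Using enumerations for the closed label sets (intention, topology, response) keeps invalid labels out of the record, while the situation is left as free text here because the list of situations in the description ends with "etc."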

With regard to pedestrian attentiveness, the dataset may focus on annotations relating to the attention of pedestrians while the ego-vehicle is approaching (e.g., within a threshold distance, etc.). In other words, for the pedestrian attentiveness, the system 100 may select a subset of scenes from the dataset used for the risk object identification, so the subset includes scenes in which the driver is influenced by pedestrians. Further, the pedestrian attentiveness portion of the dataset may include pedestrian attentiveness labels, (i.e., looking, not looking, and not sure) and may further include mutual awareness labels relating to driver monitoring and gaze information (e.g., whether the driver and the pedestrian are likely to be aware of one another). Further, the pedestrian attentiveness portion of the dataset may include labels or bounding boxes and occlusion flags around pedestrian faces as well as pedestrian bodies. Therefore, the dataset enables reasoning or inferences associated with pedestrian attentiveness to be made from both faces and bodies instead of purely using body poses. According to one aspect, pedestrians may be considered if the pedestrian has a height greater than a threshold number of pixels for the pedestrian. Similarly, the pedestrian face may be considered if the facial bounding box is greater than a threshold number of pixels for the face. Occlusion flags (e.g., partially occluded, fully occluded, non-occluded, etc.) may be set for facial bounding boxes, pedestrian body bounding boxes, and/or pedestrian bounding boxes.
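The pixel-threshold rule for considering pedestrians and faces might be sketched as follows. The threshold values (50 and 10 pixels), the `Box` type, and the function name are assumptions for illustration; the disclosure only states that thresholds exist. Measuring the face by bounding-box height is likewise an assumption, since the text does not say which dimension is compared.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels with an occlusion flag."""
    x1: float
    y1: float
    x2: float
    y2: float
    occlusion: str = "non-occluded"  # or "partially occluded", "fully occluded"

    @property
    def height(self) -> float:
        return self.y2 - self.y1


def considered(body: Box, face: Optional[Box],
               min_body_px: float = 50.0,   # hypothetical threshold
               min_face_px: float = 10.0    # hypothetical threshold
               ) -> Tuple[bool, bool]:
    """Return (body_considered, face_considered) for one pedestrian."""
    body_ok = body.height > min_body_px
    # A face is only considered when its pedestrian is considered as well.
    face_ok = body_ok and face is not None and face.height > min_face_px
    return body_ok, face_ok


tall_body = Box(0, 0, 30, 120)
small_face = Box(10, 0, 18, 8)
large_face = Box(10, 0, 25, 15, occlusion="partially occluded")
```

Here `considered(tall_body, small_face)` keeps the pedestrian but drops the face, while `considered(tall_body, large_face)` keeps both; the occlusion flag is carried along as metadata and is not used for filtering in this sketch.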

According to one aspect, the system 100 may be set up to formulate the risk object identification problem as a cause and effect problem according to the framework or architecture of FIG. 1 or the framework or architecture of FIG. 2. FIG. 2 is an exemplary component diagram of a system 200 for driver behavior risk assessment and pedestrian awareness, according to one aspect.

In this regard, the system 100 for driver behavior risk assessment and pedestrian awareness of FIG. 1 and/or the system 200 for driver behavior risk assessment and pedestrian awareness of FIG. 2 may include a processor 102, a memory 104, a storage device 106 and/or database, an image sensor 110, a risk object identifier (ROI) 120, an intention estimator 130, a scene representation generator 140, a situation predictor 150, a driver response determiner 160, and one or more buses interconnecting respective components. One or more of the ROI 120, the intention estimator 130, the scene representation generator 140, the situation predictor 150, or the driver response determiner 160 may be implemented via the processor 102, the memory 104, and/or the storage device 106. Further, the annotated dataset described above may be stored on the database or the storage device 106 or may be stored in a remote third party server. In any event, the annotated dataset may be utilized to train one or more of the ROI 120, the intention estimator 130, the scene representation generator 140, the situation predictor 150, or the driver response determiner 160.

The image sensor 110 may be an image capture device, such as a video camera, and may receive an input stream of images of an environment including one or more objects (e.g., pedestrians, road users, etc.) within the environment.

According to one aspect, node features may be obtained using a Mask R-CNN pre-trained on a dataset (e.g., COCO dataset) and DeepSORT may be applied to detect and track respective objects. To identify objects influencing driver behavior from an ego-centric view, the system 100 may construct an ego-centric spatio-temporal graph that models how road users influence the ego-vehicle using graph based reasoning. In this way, a compositional framework may be provided to determine whether the driver is influenced.

The ROI 120 may extract image-level and/or object-level features from the input stream of images of the environment. The ROI 120 may determine one or more object bounding boxes for one or more of the objects within the environment. One or more of the object bounding boxes may be around a face or a head of a pedestrian and the ROI 120 may determine whether the pedestrian is looking or is not looking at the ego vehicle. In this way, given a sequence of video frames observed, the framework of the system 100 of FIG. 1 or the system 200 of FIG. 2 may extract image-level and object-level features for objects of interest.

According to one aspect, RoIAlign may be used to extract the corresponding object-level representation. The ego node feature, (i.e., representation of the ego-vehicle), may be obtained similarly using a frame size bounding box. This also enables the capture of the scene context. Driving scenes are complicated, and not all objects in the scene influence the driver. Therefore, the system 100 or 200 may limit the objects of interest to the following classes: person, bicycle, car, motorcycle, bus, truck, traffic light, and stop sign. Additionally, the system 100 or 200 may use one or more partial convolution layers to simulate a situation without the presence of an object.
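The class restriction, the frame-size ego box, and the binary mask fed to a partial convolution can be sketched in a few lines. This is an illustrative sketch, not the disclosed implementation: the detection dictionary layout and the function names are hypothetical, and the mask convention (1 marks a valid pixel, 0 marks a pixel hidden from the convolution) is an assumption about how the removed object would be encoded.

```python
# Object classes named in the description as objects of interest.
OBJECTS_OF_INTEREST = frozenset({
    "person", "bicycle", "car", "motorcycle",
    "bus", "truck", "traffic light", "stop sign",
})


def filter_detections(detections):
    """Keep only detections whose class is an object of interest."""
    return [d for d in detections if d["label"] in OBJECTS_OF_INTEREST]


def ego_box(width, height):
    """Frame-size bounding box used to pool the ego node feature."""
    return (0, 0, width, height)


def removal_mask(width, height, box):
    """Binary mask with 0 inside `box` (object hidden) and 1 elsewhere."""
    x1, y1, x2, y2 = box
    return [
        [0 if (x1 <= x < x2 and y1 <= y < y2) else 1 for x in range(width)]
        for y in range(height)
    ]


detections = [
    {"label": "person", "box": (1, 1, 3, 3)},
    {"label": "bench", "box": (0, 0, 1, 1)},  # not an object of interest
]
kept = filter_detections(detections)
mask = removal_mask(4, 4, kept[0]["box"])
```

On the tiny 4x4 frame above, only the person survives filtering, and the resulting mask zeroes the 2x2 block covered by the person's box.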

The intention estimator 130 may estimate an intention of an ego vehicle based on the input stream of images and a temporal recurrent network (TRN). The scene representation generator 140 may generate a scene representation based on the input stream of images and a graph neural network (GNN). Additionally, the scene representation generator 140 may construct an ego-centric spatio-temporal graph using extracted image-level and object-level features from the objects of interest as the representation of the various nodes in the graph. The system 100 or 200 may utilize scene representation to highlight the effect of modeling the causal relationship between the driver intention, situation, and decision of driver maneuver. Connecting objects to the GNN enables the scene representation generator 140 to model the relationship between relevant traffic participants or features of the environment (e.g., stop signs, traffic lights, etc.), and thus, model the situation. Explained another way, the scene representation generator 140 may model the relationship of multiple other traffic participants and environment features with respect to the ego-vehicle over a series of timestamps, based on the GNN and the TRN. In this way, the system 100 or 200 may use causal influence to remove or mask an object or a feature from the images.

Specifically, once the node features are extracted, the system 100 or 200 may model interactions between the ego and objects via a message passing mechanism. To incorporate the temporal history in communication within the graph, the system 100 or 200 may model the ego and objects' temporal dynamics using an LSTM module or the scene representation generator 140.
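One round of message passing from object nodes to the ego node can be illustrated with plain lists. The disclosure does not specify the message or update functions, so the mean aggregation and additive update below are stand-in assumptions, and the LSTM that carries temporal history is deliberately left out of this sketch.

```python
def message_pass(ego, objects):
    """Update the ego feature with the mean of the object features.

    `ego` is a list of floats; `objects` is a list of same-length lists.
    Mean aggregation and additive update are illustrative choices only.
    """
    if not objects:
        return list(ego)  # no neighbours, ego feature is unchanged
    dim = len(ego)
    mean_message = [sum(o[i] for o in objects) / len(objects) for i in range(dim)]
    return [e + m for e, m in zip(ego, mean_message)]


ego_feature = [1.0, 0.0]
object_features = [[1.0, 2.0], [3.0, 4.0]]
updated_ego = message_pass(ego_feature, object_features)
```

With the two toy object features above, the mean message is `[2.0, 3.0]`, so the updated ego feature is `[3.0, 3.0]`. In a temporal setting, one such update would be computed per frame and the resulting sequence handed to a recurrent module.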

The situation predictor 150 may generate a prediction of a situation based on the scene representation and the intention of the ego vehicle. According to one aspect, the situation predictor 150 may generate the prediction of the situation based on an element-wise dot product of the scene representation and the intention of the ego vehicle. The situation predictor 150 may classify the prediction of the situation into a binary class. To identify objects influencing driver behavior from an ego-centric view, the scene representation generator 140 of the system 100 or 200 may construct an ego-centric spatio-temporal graph that models how road users influence the ego-vehicle using graph based reasoning.
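The fusion step can be illustrated as follows, reading "element-wise dot product" as an element-wise (Hadamard) product of two equal-length vectors. That reading is an assumption, as are the function name and the toy values.

```python
def elementwise_product(scene, intention):
    """Fuse two equal-length feature vectors by element-wise multiplication."""
    if len(scene) != len(intention):
        raise ValueError("scene and intention features must have equal length")
    return [s * i for s, i in zip(scene, intention)]


scene_representation = [1.0, 2.0, 3.0]
intention_representation = [0.5, 0.0, 2.0]
fused = elementwise_product(scene_representation, intention_representation)
```

The fused vector here is `[0.5, 0.0, 6.0]`: intention entries near zero suppress the matching scene entries, which is one way an intention could gate which parts of the scene matter for the situation prediction.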

The driver response determiner 160 may generate an influenced or non-influenced action determination based on the prediction of the situation and the scene representation, and may generate a risk score for one or more of the objects, traffic participants, or environment features based on the influenced or non-influenced action determination, the prediction of the situation, and/or the scene representation. According to one aspect, the driver response determiner 160 may generate the influenced or non-influenced action determination based on passing the prediction of the situation through a multilayer perceptron (MLP) and the scene representation.
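One plausible way to turn the influenced determination into per-object risk scores, in the spirit of the object removal described earlier, is to hide each object in turn and measure how much the predicted probability of an influenced response drops. The disclosure does not give a formula, so the scoring rule below is an assumption, and `predict_influenced` is a hypothetical callable standing in for the trained model.

```python
def risk_scores(object_ids, predict_influenced):
    """Score each object by the drop in P(influenced) when it is removed.

    `predict_influenced` is a hypothetical callable that takes the list of
    object ids still present in the scene and returns a probability.
    """
    baseline = predict_influenced(list(object_ids))
    scores = {}
    for oid in object_ids:
        remaining = [o for o in object_ids if o != oid]
        scores[oid] = baseline - predict_influenced(remaining)
    return scores


def toy_model(present_ids):
    """Stand-in model: only the crossing pedestrian influences the driver."""
    return 0.75 if "pedestrian_1" in present_ids else 0.25


scores = risk_scores(["pedestrian_1", "car_7"], toy_model)
risk_object = max(scores, key=scores.get)
```

With the toy model, removing the pedestrian lowers the probability by 0.5 while removing the car changes nothing, so the pedestrian is selected as the risk object.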

In this way, the system 100 of FIG. 1 or the system 200 of FIG. 2 may model the causal relationship between the driver intention, situation, and decision of driver maneuver. The network architecture or system 100 or 200 may take as input a sequence of RGB frames, a sequence of binary masks for partial convolution, and a set of object tracklets. These inputs may be passed on to the graph neural network (GNN) and TRN for obtaining the scene representation and the driver intention representation, respectively, which may be further concatenated to predict the situation. The logits from the situation classifier may further be categorized into binary classes, situation (s) and background (1−s), and may be passed through an MLP to obtain a refined representation, which may be concatenated with the graph based scene representation to predict the driver decision, which may be determined to be either Influenced or Non-influenced.
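The data flow through the prediction head described above can be traced with a small sketch: concatenate scene and intention features, predict the situation, form the binary pair (s, 1−s), refine it with an MLP, concatenate with the scene representation, and predict the decision. Everything numeric here is an assumption for illustration: the weights are hand-picked, a single linear layer with ReLU stands in for the MLP, and deriving s from one logit through a sigmoid is one possible reading of how the logits are reduced to binary classes.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def decision_head(scene, intention, params):
    fused = list(scene) + list(intention)            # concatenation
    logit = linear(params["w_sit"], params["b_sit"], fused)[0]
    s = sigmoid(logit)                               # situation score
    binary = [s, 1.0 - s]                            # situation, background
    refined = relu(linear(params["w_mlp"], params["b_mlp"], binary))
    decision_input = refined + list(scene)           # concat with scene rep
    p = sigmoid(linear(params["w_out"], params["b_out"], decision_input)[0])
    label = "Influenced" if p >= 0.5 else "Non-influenced"
    return s, p, label


# Hand-picked toy weights, not trained values.
params = {
    "w_sit": [[0.0, 0.0, 0.0, 0.0]], "b_sit": [0.0],
    "w_mlp": [[1.0, 1.0]], "b_mlp": [0.0],
    "w_out": [[2.0, 0.0, 0.0]], "b_out": [0.0],
}
s, p, label = decision_head([1.0, 0.0], [0.0, 1.0], params)
```

With these toy weights the situation score is exactly 0.5 and the decision is labeled Influenced; in practice the parameters would be learned from the annotated dataset.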

FIG. 3 is an exemplary diagram of factors associated with driver behavior risk assessment and pedestrian awareness, according to one aspect. According to one aspect, the system 100 or 200 may be modeled based on the causal relationship shown in FIG. 3, including the driver intention 302, the situation 304, and the driver response 306. It may be noted that the driver intention does not necessarily directly affect the decision of the driver (e.g., driver decision), as the driver may alter the course based on the traffic agent (i.e., the situation), irrespective of what the driver intention may be. However, the driver intention may indirectly affect the driver decision through the situation.

FIG. 4 is an exemplary diagram 400 of external stimuli associated with driver behavior risk assessment and pedestrian awareness, according to one aspect. Pedestrian attentiveness may play a significant role in the perception of risk because it involves the mutual communication and intention understanding between drivers and road users or pedestrians, which may be utilized to model their respective interactions. As shown in FIG. 4, the joint attention between the ego-driver 402 and pedestrian 404 waiting to cross (i.e., pedestrian intention) forms a non-verbal communication channel which mitigates uncertainty and promotes mutual awareness between the driver and pedestrian.

The system 100 or 200 may divide the high-level modeling of decision making into specific components in order to solve tasks associated with traffic scenarios, such as the traffic scenario of FIG. 4. In the context of a driving scene, when the ego-driver 402 needs to decide whether to alter the current course due to a traffic agent 406, there may be a specific order of events which the driver takes into account. As shown in FIG. 4, when a driver approaches an intersection, there are often multiple paths the driver may take. Based on the driver intention, the immediate destination may be first fixed, which in this case is to turn right. Then, when the intended path is decided, the second step may be to ascertain whether there is a traffic agent or situation on the intended path that might cause the driver to alter the current driving behavior. Assuming the driver intends to turn right, the vehicle crossing on the left becomes irrelevant. Finally, if there indeed is such an agent, then the driver decision is affected. As illustrated in FIG. 4, the pedestrian that is about to cross is directly in the future path of the ego-vehicle. Thus, the expected response from the driver is to proceed slowly and yield to the pedestrian.

With respect to the system 100 of FIG. 1 or the system 200 of FIG. 2, in order to model the above relationship, the system 100 or 200 may first incorporate a network to predict the driver intention. The feature representation learned through this network may then be concatenated with the graph-based representation of the scene, followed by a classifier, to predict the situation influencing the driver. To predict the driver's response (e.g., influenced or non-influenced), the indication of the presence of a situation is enough, because whether the situation is jaywalking or a stop sign, the driver would alter the current course. Therefore, the system 100 or 200 may modify the logits of the scenario classifier in a binary manner (e.g., situation or background) to indicate the presence of an obstacle or an object. These logits may then be passed through a multi-layer perceptron (MLP) and concatenated with the graph representation to predict the driver decision. The system 100 or 200 may use the same graph representation for both situation and driver decision to capture the essence that the same node in the graph (e.g., traffic agent) is responsible for both these tasks. The system 100 or 200 may optimize the network using the following multi-task loss function:

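$$\mathcal{L}_{roi} = \lambda_1 \mathcal{L}_{i} + \lambda_2 \mathcal{L}_{s} + \lambda_3 \mathcal{L}_{d} \qquad (1)$$

where $\mathcal{L}_{i}$, $\mathcal{L}_{s}$, and $\mathcal{L}_{d}$ are losses corresponding to driver intention, situation, and driver decision, respectively, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are loss balancing parameters.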
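
A small numeric sketch of the weighted sum in equation (1) follows. The use of cross entropy for each task loss, the example probabilities, and the λ values are assumptions chosen for illustration; the description does not fix them.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class (assumed per-task loss)."""
    return -math.log(probs[target_index])


def multi_task_loss(loss_i, loss_s, loss_d, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum of intention, situation, and decision losses, as in (1)."""
    l1, l2, l3 = lambdas
    return l1 * loss_i + l2 * loss_s + l3 * loss_d


# Toy predicted distributions and ground-truth class indices.
loss_i = cross_entropy([0.5, 0.25, 0.25], 0)  # intention: left/right/straight
loss_s = cross_entropy([0.5, 0.5], 1)         # situation vs. background
loss_d = cross_entropy([0.5, 0.5], 0)         # influenced vs. non-influenced

total = multi_task_loss(loss_i, loss_s, loss_d, lambdas=(1.0, 0.5, 2.0))
```

Raising one λ relative to the others makes the optimizer prioritize that task, which is the role of the loss balancing parameters.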

With regard to pedestrian attentiveness, the annotated dataset may provide a bounding box around both the faces and bodies of pedestrians, and the system 100 or 200 may use these annotations to train one or more of the ROI 120, the intention estimator 130, the scene representation generator 140, the situation predictor 150, or the driver response determiner 160 from a classification perspective and a detection perspective.

With regard to the classification, the system 100 or 200 may train a model (e.g., a ResNet-101 model) separately on cropped images of pedestrians and on cropped images of their faces (e.g., with minor occlusions up to a threshold amount) and thus demonstrate the advantage of face annotations through the aforementioned annotated dataset.

With regard to the detection, the system 100 or 200 may modify a face detector by adding a separate head for estimating pedestrian attention in parallel with existing box classification and regression branches. The system 100 or 200 may detect a face in the scene and classify the face. More specifically, for any training anchor i, the system 100 or 200 may minimize the following multi-task loss function:

$$\mathcal{L}_{p} = \mathcal{L}_{cls}(p_i, p_i^*) + \mathcal{L}_{box}(t_i, t_i^*) + \alpha \mathcal{L}_{attn}(a_i, a_i^*) \qquad (2)$$

where $\mathcal{L}_{cls}$ and $\mathcal{L}_{box}$ are face classification and box regression losses similar to, $\mathcal{L}_{attn}$ is the loss for the attention head, and $\alpha$ is used to balance the attention loss. The system 100 or 200 may use a cross entropy loss for $\mathcal{L}_{attn}$, where $a_i$ is the predicted probability of anchor i corresponding to looking, and is non-zero if anchor i is a positive anchor, i.e., has an overlap with the ground truth face box above a threshold γ. Correspondingly, $a_i^*$ is 1 when the label is Looking and 0 when the label is Not Looking. In this way, the system 100 or 200 may use a cropped portion to classify whether a pedestrian is looking or not looking at the ego-vehicle.
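
The attention term of equation (2) may be sketched as below. One reading of the description, adopted here as an assumption, is that only positive anchors (overlap with the ground truth face box above γ) contribute to the attention loss. The intersection-over-union overlap measure, the averaging over positive anchors, the box format (x1, y1, x2, y2), and the default γ of 0.5 are also assumptions.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def attention_loss(anchors, probs, gt_box, looking, gamma=0.5):
    """Mean binary cross entropy over positive anchors only."""
    target = 1.0 if looking else 0.0
    losses = []
    for box, a in zip(anchors, probs):
        if iou(box, gt_box) > gamma:  # positive anchor
            a = min(max(a, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            losses.append(-(target * math.log(a)
                            + (1.0 - target) * math.log(1.0 - a)))
    return sum(losses) / len(losses) if losses else 0.0


gt_face = (0.0, 0.0, 10.0, 10.0)
anchors = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
probs = [0.5, 0.9]  # predicted probability of "Looking" per anchor

loss_attn = attention_loss(anchors, probs, gt_face, looking=True)
```

The second anchor does not overlap the face box, so its prediction is ignored regardless of its value.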

Although in driver behavior risk assessment and pedestrian awareness the system 100 or 200 may focus on instantaneous pedestrian attention, the labels may be modified to extend the pedestrian attention problem over time by converting the task to action start detection, where the goal is to identify the starting point of an action.
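
One possible way to modify the labels for action start detection is sketched below. The rule used here, marking a frame as a start when its label is Looking (1) and the previous frame's label is Not Looking (0), and treating the frame before the sequence as Not Looking, is an assumption for illustration and is not prescribed by this description.

```python
def to_action_start_labels(frame_labels):
    """Convert per-frame Looking (1) / Not Looking (0) labels to start labels.

    A frame is marked 1 only where looking begins, i.e., the current label
    is 1 and the previous label is 0.
    """
    starts = []
    previous = 0  # assumed state before the first frame
    for label in frame_labels:
        starts.append(1 if label == 1 and previous == 0 else 0)
        previous = label
    return starts


per_frame = [0, 0, 1, 1, 0, 1]
start_labels = to_action_start_labels(per_frame)
```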

FIG. 5 is an exemplary flow diagram of a method 500 for driver behavior risk assessment and pedestrian awareness, according to one aspect. The method for driver behavior risk assessment and pedestrian awareness may include receiving 502 an input stream of images of an environment (e.g., a straight topology, a three-way intersection topology, or a four-way intersection topology) including one or more objects within the environment, estimating 504 an intention of an ego vehicle (e.g., a left-turn intention, a right-turn intention, or a straight-travel intention) based on the input stream of images and a temporal recurrent network (TRN), generating 506 a scene representation based on the input stream of images and a graph neural network (GNN), generating 508 a prediction of a situation (e.g., a stop sign, a traffic light, a crossing pedestrian, a crossing vehicle, a vehicle blocking ego lane, a congestion, a jaywalking, a vehicle backing into parking space, a vehicle on shoulder open door, or a cut-in) based on the scene representation and the intention of the ego vehicle, and generating 510 an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.

The method 500 for driver behavior risk assessment and pedestrian awareness may include extracting image-level and object-level features from the input stream of images of the environment, determining one or more object bounding boxes for one or more of the objects within the environment, determining whether a pedestrian is looking or is not looking at the ego vehicle, wherein one or more of the object bounding boxes may be around a face or a head of the pedestrian, generating the prediction of the situation based on an element-wise dot product of the scene representation and the intention of the ego vehicle, and generating the influenced or non-influenced action determination based on passing the prediction of the situation through a multilayer perceptron (MLP) and the scene representation.

Still another aspect involves a computer-readable medium including processor-executable instructions configured to implement one aspect of the techniques presented herein. An aspect of a computer-readable medium or a computer-readable device devised in these ways is illustrated in FIG. 6, wherein an implementation 600 includes a computer-readable medium 608, such as a CD-R, DVD-R, flash drive, a platter of a hard disk drive, etc., on which is encoded computer-readable data 606. This encoded computer-readable data 606, such as binary data including a plurality of zeros and ones as shown in 606, in turn includes a set of processor-executable computer instructions 604 configured to operate according to one or more of the principles set forth herein. In this implementation 600, the processor-executable computer instructions 604 may be configured to perform a method 602, such as the method 500 of FIG. 5. In another aspect, the processor-executable computer instructions 604 may be configured to implement a system, such as the system 100 of FIG. 1 or the system 200 of FIG. 2. Many such computer-readable media may be devised by those of ordinary skill in the art that are configured to operate in accordance with the techniques presented herein.

As used in this application, the terms “component”, “module”, “system”, “interface”, and the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processing unit, an object, an executable, a thread of execution, a program, or a computer. By way of illustration, both an application running on a controller and the controller may be a component. One or more components may reside within a process or thread of execution, and a component may be localized on one computer or distributed between two or more computers.

Further, the claimed subject matter is implemented as a method, apparatus, or article of manufacture using standard programming or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. Of course, many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.

FIG. 7 and the following discussion provide a description of a suitable computing environment to implement aspects of one or more of the provisions set forth herein. The operating environment of FIG. 7 is merely one example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the operating environment. Example computing devices include, but are not limited to, personal computers, server computers, hand-held or laptop devices, mobile devices, such as mobile phones, Personal Digital Assistants (PDAs), media players, and the like, multiprocessor systems, consumer electronics, mini computers, mainframe computers, distributed computing environments that include any of the above systems or devices, etc.

Generally, aspects are described in the general context of “computer readable instructions” being executed by one or more computing devices. Computer readable instructions may be distributed via computer readable media as will be discussed below. Computer readable instructions may be implemented as program modules, such as functions, objects, Application Programming Interfaces (APIs), data structures, and the like, that perform one or more tasks or implement one or more abstract data types. Typically, the functionality of the computer readable instructions is combined or distributed as desired in various environments.

FIG. 7 illustrates a system 700 including a computing device 712 configured to implement one aspect provided herein. In one configuration, the computing device 712 includes at least one processing unit 716 and memory 718. Depending on the exact configuration and type of computing device, memory 718 may be volatile, such as RAM, non-volatile, such as ROM, flash memory, etc., or a combination of the two. This configuration is illustrated in FIG. 7 by dashed line 714.

In other aspects, the computing device 712 includes additional features or functionality. For example, the computing device 712 may include additional storage such as removable storage or non-removable storage, including, but not limited to, magnetic storage, optical storage, etc. Such additional storage is illustrated in FIG. 7 by storage 720. In one aspect, computer readable instructions to implement one aspect provided herein are in storage 720. Storage 720 may store other computer readable instructions to implement an operating system, an application program, etc. Computer readable instructions may be loaded in memory 718 for execution by the at least one processing unit 716, for example.

The term “computer readable media” as used herein includes computer storage media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions or other data. Memory 718 and storage 720 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information and which may be accessed by the computing device 712. Any such computer storage media is part of the computing device 712.

The term “computer readable media” includes communication media. Communication media typically embodies computer readable instructions or other data in a “modulated data signal” such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” includes a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.

The computing device 712 includes input device(s) 724 such as keyboard, mouse, pen, voice input device, touch input device, infrared cameras, video input devices, or any other input device. Output device(s) 722 such as one or more displays, speakers, printers, or any other output device may be included with the computing device 712. Input device(s) 724 and output device(s) 722 may be connected to the computing device 712 via a wired connection, wireless connection, or any combination thereof. In one aspect, an input device or an output device from another computing device may be used as input device(s) 724 or output device(s) 722 for the computing device 712. The computing device 712 may include communication connection(s) 727 to facilitate communications with one or more other devices 730, such as through network 728, for example.

Although the subject matter has been described in language specific to structural features or methodological acts, it is to be understood that the subject matter of the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example aspects.

Various operations of aspects are provided herein. The order in which one or more or all of the operations are described should not be construed to imply that these operations are necessarily order dependent. Alternative ordering will be appreciated based on this description. Further, not all operations may necessarily be present in each aspect provided herein.

As used in this application, “or” is intended to mean an inclusive “or” rather than an exclusive “or”. Further, an inclusive “or” may include any combination thereof (e.g., A, B, or any combination thereof). In addition, “a” and “an” as used in this application are generally construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Additionally, at least one of A and B and/or the like generally means A or B or both A and B. Further, to the extent that “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”.

Further, unless specified otherwise, “first”, “second”, or the like are not intended to imply a temporal aspect, a spatial aspect, an ordering, etc. Rather, such terms are merely used as identifiers, names, etc. for features, elements, items, etc. For example, a first channel and a second channel generally correspond to channel A and channel B or two different or two identical channels or the same channel. Additionally, “comprising”, “comprises”, “including”, “includes”, or the like generally means comprising or including, but not limited to.

It will be appreciated that various of the above-disclosed and other features and functions, or alternatives or varieties thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.

1. A system for driver behavior risk assessment and pedestrian awareness, comprising: an intention estimator estimating an intention of an ego vehicle based on an input stream of images of an environment including one or more objects within the environment and a first neural network; a scene representation generator generating a scene representation based on the input stream of images and a second neural network; a situation predictor generating a prediction of a situation based on the scene representation and the intention of the ego vehicle; and a driver response determiner generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.
2. The system for driver behavior risk assessment and pedestrian awareness of claim 1, wherein the intention of the ego vehicle is estimated as a left-turn intention, a right-turn intention, or a straight-travel intention.
3. The system for driver behavior risk assessment and pedestrian awareness of claim 1, wherein the environment includes a straight topology, a three-way intersection topology, or a four-way intersection topology.
4. The system for driver behavior risk assessment and pedestrian awareness of claim 1, wherein the situation includes a stop sign, a traffic light, a crossing pedestrian, a crossing vehicle, a vehicle blocking ego lane, a congestion, a jaywalking, a vehicle backing into parking space, a vehicle on shoulder open door, or a cut-in.
5. The system for driver behavior risk assessment and pedestrian awareness of claim 1, comprising a risk object identifier (ROI) extracting image-level and object-level features from the input stream of images of the environment.
6. The system for driver behavior risk assessment and pedestrian awareness of claim 5, wherein the ROI determines one or more object bounding boxes for one or more of the objects within the environment.
7. The system for driver behavior risk assessment and pedestrian awareness of claim 6, wherein one or more of the object bounding boxes are around a face or a head of a pedestrian and the ROI determines whether the pedestrian is looking or is not looking at the ego vehicle.
8. The system for driver behavior risk assessment and pedestrian awareness of claim 1, wherein the situation predictor generates the prediction of the situation based on an element-wise dot product of the scene representation and the intention of the ego vehicle.
9. The system for driver behavior risk assessment and pedestrian awareness of claim 1, wherein the driver response determiner generates the influenced or non-influenced action determination based on passing the prediction of the situation through a multilayer perceptron (MLP) and the scene representation.
10. The system for driver behavior risk assessment and pedestrian awareness of claim 1, wherein the situation predictor classifies the prediction of the situation into a binary class.
11. A method for driver behavior risk assessment and pedestrian awareness, comprising: estimating an intention of an ego vehicle based on an input stream of images of an environment including one or more objects within the environment and a first neural network; generating a scene representation based on the input stream of images and a second neural network; generating a prediction of a situation based on the scene representation and the intention of the ego vehicle; and generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.
12. The method for driver behavior risk assessment and pedestrian awareness of claim 11, wherein the intention of the ego vehicle is estimated as a left-turn intention, a right-turn intention, or a straight-travel intention.
13. The method for driver behavior risk assessment and pedestrian awareness of claim 11, wherein the environment includes a straight topology, a three-way intersection topology, or a four-way intersection topology.
14. The method for driver behavior risk assessment and pedestrian awareness of claim 11, wherein the situation includes a stop sign, a traffic light, a crossing pedestrian, a crossing vehicle, a vehicle blocking ego lane, a congestion, a jaywalking, a vehicle backing into parking space, a vehicle on shoulder open door, or a cut-in.
15. The method for driver behavior risk assessment and pedestrian awareness of claim 11, comprising extracting image-level and object-level features from the input stream of images of the environment.
16. The method for driver behavior risk assessment and pedestrian awareness of claim 15, comprising determining one or more object bounding boxes for one or more of the objects within the environment.
17. The method for driver behavior risk assessment and pedestrian awareness of claim 16, comprising determining whether a pedestrian is looking or is not looking at the ego vehicle, wherein one or more of the object bounding boxes are around a face or a head of the pedestrian.
18. The method for driver behavior risk assessment and pedestrian awareness of claim 11, comprising generating the prediction of the situation based on an element-wise dot product of the scene representation and the intention of the ego vehicle.
19. The method for driver behavior risk assessment and pedestrian awareness of claim 11, comprising generating the influenced or non-influenced action determination based on passing the prediction of the situation through a multilayer perceptron (MLP) and the scene representation.
20. A driver behavior risk assessment and pedestrian awareness vehicle, comprising: an intention estimator estimating an intention of the vehicle based on an input stream of images of an environment including one or more objects within the environment and a first neural network; a scene representation generator generating a scene representation based on the input stream of images and a second neural network; a situation predictor generating a prediction of a situation based on the scene representation and the intention of the vehicle; and a driver response determiner generating an influenced or non-influenced action determination based on the prediction of the situation and the scene representation.